Tuning And Understanding MILC Performance In Cray XK6 GPU Clusters
Authors
Abstract
Graphics processing units (GPUs) are becoming increasingly popular in high-performance computing because of their high performance, power efficiency, and low cost. Lattice QCD is one of the fields that has successfully adopted GPUs and scaled to hundreds of them. In this paper we report our experience profiling and understanding the performance of MILC, one of the lattice QCD computation packages, running on multi-node Cray XK6 systems through a domain-specific GPU library called QUDA. QUDA is a library for accelerating lattice QCD computations on GPUs; it started at Boston University and has evolved into a multi-institution project. It supports multiple quark actions and has been interfaced to many applications, including MILC and Chroma. The most time-consuming part of a lattice QCD computation is a sparse matrix solve, and QUDA provides an efficient Conjugate Gradient (CG) solver among others. By partitioning the problem in the 4-D space-time domain, the solvers in the QUDA library enable applications to scale to hundreds of GPUs with high efficiency. The other computationally intensive components, such as link fattening and the gauge force and fermion force computations, are also being ported to GPUs.
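Since the solver is the centerpiece of the work summarized above, the following is a minimal sketch of the Conjugate Gradient iteration that solvers such as QUDA's are built around, written in plain C++ so it is self-contained and runnable. It is not QUDA or MILC code: in QUDA the matrix-vector product is the Dirac operator applied by GPU kernels, the lattice is partitioned across GPUs in the 4-D space-time domain, and the inner products become global reductions. The stand-in apply_operator() here (a 1-D Laplacian plus a mass term) and all other names are illustrative assumptions.

```cpp
// Minimal conjugate gradient (CG) sketch in plain C++.
// NOT QUDA code: QUDA runs this iteration on GPUs with the Dirac operator in
// place of apply_operator(), the lattice split across GPUs in the 4-D
// space-time domain, and the dot products turned into global reductions.
// The 1-D Laplacian operator and all names below are illustrative assumptions.
#include <cstddef>
#include <cstdio>
#include <cmath>
#include <vector>

using Vec = std::vector<double>;

// Stand-in for the sparse, symmetric positive-definite operator A
// (in lattice QCD, roughly D^dagger D): a 1-D Laplacian plus a mass term.
static void apply_operator(const Vec& x, Vec& y, double mass) {
    const std::size_t n = x.size();
    for (std::size_t i = 0; i < n; ++i) {
        const double left  = (i > 0)     ? x[i - 1] : 0.0;
        const double right = (i + 1 < n) ? x[i + 1] : 0.0;
        y[i] = (2.0 + mass) * x[i] - left - right;
    }
}

static double dot(const Vec& a, const Vec& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Solve A x = b with CG; stop when |r| <= tol * |b|. Returns iterations used.
static int cg_solve(Vec& x, const Vec& b, double mass, double tol, int max_iter) {
    const std::size_t n = b.size();
    Vec r(n), p(n), Ap(n);

    apply_operator(x, Ap, mass);                       // r = b - A x
    for (std::size_t i = 0; i < n; ++i) r[i] = b[i] - Ap[i];
    p = r;
    double rr = dot(r, r);
    const double target = tol * tol * dot(b, b);

    int iter = 0;
    while (rr > target && iter < max_iter) {
        apply_operator(p, Ap, mass);                   // the most expensive step
        const double alpha = rr / dot(p, Ap);          // step length along p
        for (std::size_t i = 0; i < n; ++i) x[i] += alpha * p[i];
        for (std::size_t i = 0; i < n; ++i) r[i] -= alpha * Ap[i];
        const double rr_new = dot(r, r);
        const double beta = rr_new / rr;               // standard CG beta
        for (std::size_t i = 0; i < n; ++i) p[i] = r[i] + beta * p[i];
        rr = rr_new;
        ++iter;
    }
    return iter;
}

int main() {
    const std::size_t n = 1024;
    Vec b(n, 1.0), x(n, 0.0);
    const int iters = cg_solve(x, b, /*mass=*/0.01, /*tol=*/1e-8, /*max_iter=*/10000);
    std::printf("CG stopped after %d iterations; |x| = %.6e\n",
                iters, std::sqrt(dot(x, x)));
    return 0;
}
```

In the multi-GPU setting described in the abstract, each operator application would additionally exchange boundary (halo) sites between neighboring GPUs; managing that exchange is what the 4-D domain partitioning in the QUDA solvers provides.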
Similar articles
A novel hybrid CPU–GPU generalized eigensolver for electronic structure calculations based on fine-grained memory aware tasks
The adoption of hybrid CPU–GPU nodes in traditional supercomputing platforms such as the Cray XK6 opens acceleration opportunities for electronic structure calculations in materials science and chemistry applications, where medium-sized generalized eigenvalue problems must be solved many times. These eigenvalue problems are too small to solve effectively on distributed systems, but can benefit f...
Titan: Early experience with the Cray XK6 at Oak Ridge National Laboratory
In 2011, Oak Ridge National Laboratory began an upgrade to Jaguar to convert it from a Cray XT5 to a Cray XK6 system named Titan. This is being accomplished in two phases. The first phase, completed in early 2012, replaced all of the XT5 compute blades with XK6 compute blades, and replaced the SeaStar interconnect with Cray’s new Gemini network. Each compute node is configured with an AMD Opter...
Hybrid Programming and Performance for Beam Propagation Modeling
We examined hybrid parallel infrastructures in order to ensure performance and scalability for beam propagation modeling as we move toward extreme-scale systems. Using an MPI programming interface for parallel algorithms, we expanded the capability of our existing electromagnetic solver to a hybrid (MPI/shared-memory) model that can potentially use the computer resources on future-generation co...
Acceleration and Verification of Virtual High-throughput Multiconformer Docking
In this chapter we give the current state of high-throughput virtual screening. We describe a case study of using a task-parallel MPI (Message Passing Interface) version of Autodock4 to run a virtual high-throughput screen of one-million compounds on the Jaguar Cray XK6 Supercomputer at Oak Ridge National Laboratory. We include a description of scripts developed to increase the efficiency of th...
Optimising Hydrodynamics applications for the Cray XC30 with the application tool suite
Power constraints are forcing HPC systems to continue to increase hardware concurrency. Efficiently scaling applications on future machines will be essential for improved science and it is recognised that the “flat” MPI model will start to reach its scalability limits. The optimal approach is unknown, necessitating the use of mini-applications to rapidly evaluate new approaches. Reducing MPI ta...